AVIVA: insuring people since 1696
CUSTOMERS: ca. 15M in the UK alone
QUANTUM: 600+ data practitioners globally
CUSTOMER SCIENCE: “Know customers, take better actions”
Embedding user agent strings (UAS) can be done in a variety of ways
fastText (Bojanowski et al. 2016) is one particularly useful algorithm:
Using the fasttext Python library from R is easy, thanks to reticulate
# Install `fasttext` first (see https://fasttext.cc/docs/en/support.html)
# Load the `reticulate` package
library(reticulate)
# Make sure `fasttext` is available to R:
py_module_available("fasttext")
## [1] TRUE
# Load `fasttext`:
ft <- import("fasttext")
# Then call the required methods using the `$` notation, e.g.: `ft$train_supervised`
Training data: a sample of 200,000 unique UAS from the whatismybrowser.com database
m_unsup <- ft$train_unsupervised(input = "./data/train_data_unsup.txt",
                                 model = "skipgram",
                                 lr = 0.05,
                                 dim = 32L,       # vector dimension
                                 ws = 3L,         # context window size
                                 minCount = 1L,
                                 minn = 2L,       # min length of character n-grams
                                 maxn = 6L,       # max length of character n-grams
                                 neg = 3L,        # number of negative samples
                                 wordNgrams = 2L,
                                 loss = "ns",     # negative sampling loss
                                 epoch = 100L,
                                 thread = 10L)
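Once trained, the model can be queried directly from R as a quick sanity check. For example, `get_nearest_neighbors()` returns the tokens whose vectors lie closest to a given word (the query token below is illustrative and assumes it occurred in the training data):

```r
# Tokens most similar to an (assumed) token from the training data;
# returns a list of (cosine similarity, token) pairs
m_unsup$get_nearest_neighbors("chrome", k = 5L)
```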
# `dplyr` provides bind_rows() and re-exports the `%>%` pipe used below
library(dplyr)

test_data <- readLines("./data/test_data_unsup.txt")

emb_unsup <- test_data %>%
  lapply(function(x) {
    m_unsup$get_sentence_vector(text = x) %>% # returns the average vector for a UAS
      t() %>%
      as.data.frame()
  }) %>%
  bind_rows() %>%
  setNames(paste0("f", 1:32))
emb_unsup[1:3, 1:10]
##      f1       f2    f3    f4      f5     f6      f7    f8     f9    f10
## 1 0.197 -0.03726 0.147 0.153  0.0423 0.0488  0.0196 0.132 0.1946  0.186
## 2 0.182  0.00307 0.147 0.101  0.0326 0.0847 -0.0174 0.108 0.1957  0.171
## 3 0.101 -0.28220 0.189 0.202 -0.1623 0.2622  0.1386 0.106 0.0733 -0.035
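Under the hood, `get_sentence_vector()` is, roughly, an average of the L2-normalised vectors of the tokens in the string. A minimal pure-R sketch of that averaging step, using made-up token vectors:

```r
# Three made-up 4-dimensional token vectors (one per row)
tok_vecs <- matrix(c(1, 0, 0, 0,
                     0, 2, 0, 0,
                     0, 0, 3, 0),
                   nrow = 3, byrow = TRUE)

# L2-normalise each row, then average the rows
norms <- sqrt(rowSums(tok_vecs^2))
sent_vec <- colMeans(tok_vecs / norms)

sent_vec
## [1] 0.3333333 0.3333333 0.3333333 0.0000000
```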
m_sup <- ft$train_supervised(input = "./data/train_data_sup.txt",
                             lr = 0.05,
                             dim = 32L,           # vector dimension
                             ws = 3L,             # context window size
                             minCount = 1L,
                             minCountLabel = 10L, # min label occurrence
                             minn = 2L,           # min length of character n-grams
                             maxn = 6L,           # max length of character n-grams
                             neg = 3L,
                             wordNgrams = 2L,
                             loss = "softmax",    # loss function
                             epoch = 100L,
                             thread = 10L)
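New UAS can then be classified with the supervised model via `predict()`. A minimal sketch (the input string and `k` below are illustrative; the labels in `train_data_sup.txt` are assumed to carry fastText's `__label__` prefix):

```r
# Top-3 predicted labels and their probabilities for a new UAS
pred <- m_sup$predict("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
                      k = 3L)

pred[[1]] # character vector of predicted labels
pred[[2]] # corresponding probabilities
```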
GitHub: ranalytics/uas_embeddings
Connect: linkedin.com/in/mastitsky